{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Grid Search for Binary Classification\n",
    "\n",
    "This notebook goes through applying the `GridSearch` algorithm in Fairlearn to a binary classification problem, where we also have a binary protected attribute. This algorithm comes from the paper [\"A Reductions Approach to Fair Classification\" (Agarwal et al. 2018)](https://arxiv.org/abs/1803.02453). The grid search is a simplified version of the full algorithm (appearing in section 3.4), which works best for binary classification and a binary protected attribute.\n",
    "\n",
    "The specific problem we consider a biased credit scoring problem. We assume that we have a collection of individuals characterised by two features - a credit score in the range $[0, 1]$ and a binary protected attribute from ${a_0, a_1}$. We also have a binary 'score' for each individual indicating whether or not they got a loan. This is determined by applying a threshold to their credit score, and to make the dataset unfair, we can set different thresholds for the two groups $a_0$ and $a_1$.\n",
    "\n",
    "In this simple case, we make the protected attribute and the credit score available to both the estimator and the model. This gives us a straightforward method of assessing fairness - look at the model, and see if zero weight is put on the protected attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from fairlearn.reductions import GridSearch\n",
    "from fairlearn.reductions import DemographicParity\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.linear_model import LogisticRegression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use the following function to create the input data. The credit scores for each population are uniformly distributed in the range $[0, 1]$, and we apply separate thresholds to each subpopulation to determine the score $Y \\in {0, 1}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def simple_threshold_data(number_a0, number_a1,\n",
    "                          a0_threshold, a1_threshold,\n",
    "                          a0_label, a1_label):\n",
    "\n",
    "    a0s = np.full(number_a0, a0_label)\n",
    "    a1s = np.full(number_a1, a1_label)\n",
    "\n",
    "    a0_scores = np.linspace(0, 1, number_a0)\n",
    "    a1_scores = np.linspace(0, 1, number_a1)\n",
    "    score_feature = np.concatenate((a0_scores, a1_scores), axis=None)\n",
    "\n",
    "    A = np.concatenate((a0s, a1s), axis=None)\n",
    "\n",
    "    Y_a0 = [x > a0_threshold for x in a0_scores]\n",
    "    Y_a1 = [x > a1_threshold for x in a1_scores]\n",
    "\n",
    "    Y = np.concatenate((Y_a0, Y_a1), axis=None)\n",
    "\n",
    "    X = pd.DataFrame({\"credit_score_feature\": score_feature,\n",
    "                      \"example_sensitive_feature\": A})\n",
    "    return X, Y, A"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now use the above function to generate our dataset. We have 31 individuals with label $a_0$, and they only require a credit score of 0.2 to get the loan. In contrast, the 21 members of the $a_1$ population require a score of 0.7. The label values for $a_0$ and $a_1$ have to be numeric, but the actual values are not especially important. We set them to 2 and 3, to reduce fiddling with `sklearn`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_samples_a0 = 31\n",
    "num_samples_a1 = 21\n",
    "\n",
    "a0_threshold= 0.2\n",
    "a1_threshold = 0.7\n",
    "\n",
    "a0_label = 2\n",
    "a1_label = 3\n",
    "\n",
    "X, Y, A = simple_threshold_data(num_samples_a0, num_samples_a1, a0_threshold, a1_threshold, a0_label, a1_label)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following helper function plots the score $Y$ for each of the subpopulations in our data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "def plot_data(Xs, Ys):\n",
    "    labels = np.unique(Xs[\"example_sensitive_feature\"])\n",
    "    \n",
    "    for l in labels:\n",
    "        label_string = str(l.item())\n",
    "        mask = Xs[\"example_sensitive_feature\"] == l\n",
    "        plt.scatter(Xs[mask].credit_score_feature, Ys[mask], label=str(\"Label=\"+label_string))\n",
    "        plt.xlabel(\"Credit Score\")\n",
    "        plt.ylabel(\"Got Loan\")\n",
    "        \n",
    "    plt.legend()\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plotting the input data, we can clearly see the bias against the $a_1$ population."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_data(X, Y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this notebook, we use a simple binary classifier from `sklearn`. We can train this model on our biased data, and look at the weights the model places on each feature. The fact that the `sensitive_features` has non-zero weight (the second entry in the `coef_` array) tells us that we have a biased model (note that measuring fairness in this way is only valid for this simple example notebook - in the real world, fairness is more complicated)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "unfair_model = LogisticRegression(solver='liblinear', fit_intercept=True)\n",
    "unfair_model.fit(X, Y, sample_weight=np.ones(len(Y)))\n",
    "\n",
    "unfair_model.coef_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also plot out the predictions for this model. We can see that a few points have changed (which is not unexpected) but the bias definitely remains."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Y_predict_unfair = unfair_model.predict(X)\n",
    "plot_data(X, Y_predict_unfair)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reducing Unfairness with Grid Search\n",
    "\n",
    "Now, we move on to attempting to reduce the unfairness in our model using the grid search. This tries a series of different models, parameterized by a Lagrange multiplier $\\lambda_i$. For each value of $\\lambda$, the algorithm reweights and relabels the input data, and trains a fresh model ($\\lambda=0$ corresponds to the unaltered case).\n",
    "\n",
    "The grid search acts like a normal `sklearn` estimator, implementing `fit()` and `predict()` methods. The `fit()` method performs the grid search, and the best model found (according to a specified selection rule) is used in `predict()` calls. However, after `fit()` is called, there are several extra properties on the estimator. \n",
    "- for each grid point we have an entry in the following collections: `lambda_vecs_`, `objectives_`, `gammas_`, `predictors_`.\n",
    "- `best_idx_` indicates the index of the grid point where grid search found the best solution. The `predict()` method uses the corresponding predictor.\n",
    "\n",
    "We start by telling the algorithm that we want to try 7 different values of $\\lambda$ (which are generated for us)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "first_sweep=GridSearch(LogisticRegression(solver='liblinear', fit_intercept=True),\n",
    "                       constraints=DemographicParity(),\n",
    "                       grid_size=7)\n",
    "\n",
    "first_sweep.fit(X, Y, sensitive_features=A)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can examine the values of $\\lambda_i$ chosen for us:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lambda_vecs = first_sweep.lambda_vecs_\n",
    "lambda_vecs[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is rather more than a single value $\\lambda$, so what's going on? These results are the outputs of the `Moment` type which drives the reduction approach to removing disparity. There are actually four Lagrange multipliers here, indexed by a tuple (sign, grp, group_id). The 'group_id' field corresponds to the labels $a_0$ and $a_1$, while the 'grp' field is the same in all cases (this is because we have specified Demographic Parity as our disparity criterion). Finally the 'sign' comes from the reductions approach specifying separate multipliers for violations of the disparity criterion from above and below. Both of these are constrained to be positive.\n",
    "\n",
    "So we have four multipliers - $\\lambda_{(+,2)}$, $\\lambda_{(-,2)}$, $\\lambda_{(+,3)}$ and $\\lambda_{(-,3)}$. Without losing generality, we can decide to modify one of these, but not the other, and the `DemographicParity` object we passed to the `GridSearch` constructor chose to make $\\lambda_{(+,3)}=\\lambda_{(-,3)}=0$. Finally, we can combine the 'above' and 'below' multipliers for the other group and obtain $\\lambda_i = \\lambda_{(+,2)} - \\lambda_{(-,2)}$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "actual_multipliers = [lambda_vecs[col][(\"+\", \"all\", 2)]-lambda_vecs[col][(\"-\", \"all\", 2)] for col in lambda_vecs]\n",
    "actual_multipliers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we can look at how the weight the models place on the protected attribute (recall that in the fair case, this would be zero) varies with $\\lambda_i$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "first_sweep_sensitive_feature_weights = [\n",
    "            predictor.coef_[0][1] for predictor in first_sweep.predictors_]\n",
    "plt.scatter(actual_multipliers, first_sweep_sensitive_feature_weights)\n",
    "plt.xlabel(\"Lagrange Multiplier\")\n",
    "plt.ylabel(\"Weight of Protected Attribute in Model\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can take a look at the $\\lambda$ of the best model found in this sweep, along with the coefficients applied to the features. The weight on the protected attribute is now smaller:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lambda_best = first_sweep.lambda_vecs_[first_sweep.best_idx_][(\"+\", \"all\", 2)] - first_sweep.lambda_vecs_[first_sweep.best_idx_][(\"-\", \"all\", 2)]\n",
    "print(\"lambda_best =\", lambda_best)\n",
    "print(\"coefficients =\", first_sweep.predictors_[first_sweep.best_idx_].coef_)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also generate predictions from this model. When we plot them, we see that we're much closer to having a fair model, with the two thresholds closer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Y_first_predict = first_sweep.predict(X)\n",
    "plot_data(X, Y_first_predict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can do better. For this simple case, we can search $\\lambda$ values around that of the best model selected in the first sweep:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "second_sweep_multipliers = np.linspace(lambda_best-0.5, lambda_best+0.5, 31)\n",
    "\n",
    "iterables = [['+', '-'], ['all'], [a0_label, a1_label]]\n",
    "midx = pd.MultiIndex.from_product(iterables, names=['sign', 'event', 'group_id'])\n",
    "\n",
    "second_sweep_lambdas = []\n",
    "for l in second_sweep_multipliers:\n",
    "    nxt = pd.Series(np.zeros(4), index=midx)\n",
    "    if l < 0:\n",
    "        nxt[(\"-\", \"all\", 2)] = abs(l)\n",
    "    else:\n",
    "        nxt[(\"+\", \"all\", 2)] = l\n",
    "    second_sweep_lambdas.append(nxt)\n",
    "    \n",
    "multiplier_df = pd.concat(second_sweep_lambdas,axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we can use these multipliers in another sweep:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "second_sweep=GridSearch(LogisticRegression(solver='liblinear', fit_intercept=True),\n",
    "                        constraints=DemographicParity(),\n",
    "                        grid=multiplier_df)\n",
    "\n",
    "second_sweep.fit(X, Y, sensitive_features=A)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once more we can plot the weight placed on the protected attribute as a function of $\\lambda$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "second_sweep_sensitive_feature_weights = [\n",
    "            predictor.coef_[0][1] for predictor in second_sweep.predictors_]\n",
    "plt.scatter(second_sweep_multipliers, second_sweep_sensitive_feature_weights)\n",
    "plt.xlabel(\"Lagrange Multiplier\")\n",
    "plt.ylabel(\"Weight of Protected Attribute in Model\")\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can look at the $\\lambda$ and weights placed on each feature in the best model found from this sweep:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lambda_best_second = second_sweep.lambda_vecs_[second_sweep.best_idx_][(\"+\", \"all\", 2)] \\\n",
    "                     -second_sweep.lambda_vecs_[second_sweep.best_idx_][(\"-\", \"all\", 2)]\n",
    "print(\"lambda_best =\", lambda_best_second)\n",
    "print(\"coefficients =\", second_sweep.predictors_[second_sweep.best_idx_].coef_)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And finally, we can obtain a fresh set of predictions from this model. We can see that the threshold is converging on the value 0.2, which was originally specified for just the $a_0$ population:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Y_second_predict = second_sweep.predict(X)\n",
    "plot_data(X, Y_second_predict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}